Let $\mathscr{P}(E)$ be the space of probability measures on a measurable space $(E,\mathcal{E})$. In this paper
we introduce a class of nonlinear Markov chain Monte Carlo (MCMC) methods for simulating
from a probability measure $\pi \in \mathscr{P}(E)$. Nonlinear Markov kernels (see [Feynman-Kac Formulae:
Genealogical and Interacting Particle Systems with Applications (2004) Springer])
$K:\mathscr{P}(E)\times E \to \mathscr{P}(E)$ can be constructed to, in some sense, improve over MCMC methods. However, such
nonlinear kernels cannot be simulated exactly, so approximations of the nonlinear kernels are
constructed using auxiliary or potentially self-interacting chains. Several nonlinear kernels are
presented and it is demonstrated that, under some conditions, the associated approximations
exhibit a strong law of large numbers; our proof technique is via the Poisson equation and
Foster-Lyapunov conditions. We investigate the performance of our approximations with some
simulations.

1. Introduction

Monte Carlo simulation is one of the most important elements of computational statistics. 
This is because of its relative simplicity and computational convenience in constructing 
estimates of high-dimensional integrals. That is, for a $\pi$-integrable $f:E\to\mathbb{R}$, we approximate

$$\pi(f) := \int_E f(x)\,\pi(dx) \qquad (1.1)$$

by

$$S_n^X(f) = \frac{1}{n+1}\sum_{i=0}^{n} f(X_i),$$

where $S_n^X(du) := \frac{1}{n+1}\sum_{i=0}^{n}\delta_{X_i}(du)$ is the empirical measure based upon random
variables $\{X_k\}_{0\le k\le n}$ drawn from $\pi$. Such integrals appear routinely in Bayesian statistics,
in terms of posterior expectations; see [26] and the references therein. In those cases, E 
is often of very high dimension and complex simulation methods such as MCMC [26] and 
sequential Monte Carlo (SMC) [10, 13] need to be used. 
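
As a minimal sketch of the estimator $S_n^X(f)$ defined above, the following Python snippet averages a test function over i.i.d. draws. The standard normal target, the choice $f(x)=x^2$ and the helper name `empirical_average` are illustrative assumptions, not part of the paper.

```python
import random

def empirical_average(samples, f):
    """Return S_n(f) = (1/(n+1)) * sum_i f(X_i) for samples X_0, ..., X_n."""
    return sum(f(x) for x in samples) / len(samples)

# Assumed illustration: pi is a standard normal and f(x) = x^2, so pi(f) = 1.
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]
estimate = empirical_average(draws, lambda x: x * x)
```

When exact draws from $\pi$ are unavailable, the same average is taken along an MCMC trajectory instead, which is the setting of the rest of the paper.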

It has long been known by Monte Carlo specialists that standard MCMC algorithms 
often have difficulties in simulating from complicated distributions - for example, when 
the target $\pi$ exhibits multiple modes and/or possesses strong dependencies between
sub-components of $X$. In the former case, the Markov chain can take an unreasonable amount
of time to jump between these modes and the estimates of (1.1) are very inaccurate. 

As a result, there have been a large number of alternative methods proposed in the 
literature; we detail some of them here. Many of these approaches have relied upon 
MCMC techniques such as adaptive MCMC [5, 20], which, in some instances, attempts to 
improve the mixing properties of the transition kernel by using the information learned in 
the past. In addition, there are methods that rely upon the simulation of parallel Markov
chains [16] and genetic algorithm type moves; see [22] for a review. These latter methods
use the idea of running some of the parallel chains with invariant probability measure $\eta$,
where $\eta$ is easier to explore and is related to $\pi$; hence the samples of the parallel chains
can provide valuable information for simulating from $\pi$. Extensions to MCMC-based
simulation methods have combined MCMC with SMC ideas, see, for example, [2, 11]. 
Such approaches are often more flexible than MCMC. 

In this paper, we consider another alternative: nonlinear MCMC via auxiliary or self- 
interacting approximations. Such methods rely primarily upon the ideas of MCMC. How- 
ever, it is demonstrated below that the auxiliary/self-interacting approximation idea is 
similar to that of approximating Feynman-Kac formulae [10] and as such is linked to 
SMC methodology. It should be noted that related ideas have appeared, directly in [9] 
and indirectly in [23]; see [4, 7] for some theoretical analysis. Subsequent to the first 
versions of this work [3] a variety of related articles have appeared: [6-8]; we cite these 
where appropriate, but note the substantial overlap between our work and these papers. 



1.1. Nonlinear Markov kernels via interacting approximations 

Standard MCMC algorithms rely on Markov kernels of the form $K:E\to\mathscr{P}(E)$. These
Markov kernels are linear operators on $\mathscr{P}(E)$; that is, $\mu(dy) = \int_E \xi(dx)K(x,dy)$, where
$\mu,\xi\in\mathscr{P}(E)$. A nonlinear Markov kernel $K:\mathscr{P}(E)\times E\to\mathscr{P}(E)$ is defined as a nonlinear
operator on the space of probability measures. Nonlinear Markov kernels, $K_\mu$, can often
be constructed to exhibit superior mixing properties to ordinary MCMC versions. For
example, let

$$K_\mu(x,dy) = (1-\epsilon)K(x,dy) + \epsilon\int_E \mu(dz)K(z,dy), \qquad (1.2)$$

where $K$ is a Markov kernel of invariant distribution $\pi$, $\epsilon\in(0,1)$ and $\mu\in\mathscr{P}(E)$. Simulating
from $K_\pi$ is clearly desirable as we allow regenerations from $\pi$, with $K_\pi$ strongly
uniformly ergodic (see [27]). However, in most cases, it is not possible to simulate from
$K_\pi$ and, instead, an approximation is proposed.

A self-interacting Markov chain (see [12]) generates a stochastic process $\{X_n\}_{n\ge0}$ that
is allowed to interact with values realized in the past. That is, we might approximate, at
time $n+1$, $\pi$ by $S_n^X$. This process corresponds to generating a value from the history of
the process, and then a mutation step, via the kernel $K$. In practice, the self-interaction
can lead to very poor algorithmic performance [3]; instead, an auxiliary Markov chain is used to
approximate the nonlinear kernel.
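
The self-interacting scheme described above can be sketched as follows, purely as an illustration of the selection and mutation steps when $\mu$ in (1.2) is replaced by the chain's own empirical measure. The function name `self_interacting_chain`, the callable `k_step` (standing in for one draw from $K$) and the Gaussian increment used in the usage lines are assumptions made for this sketch; the paper itself reports that this self-interacting variant can perform poorly.

```python
import random

def self_interacting_chain(k_step, x0, n_steps, eps, rng):
    """Approximate kernel (1.2) with mu replaced by the chain's own history."""
    history = [x0]
    x = x0
    for _ in range(n_steps):
        if rng.random() < eps:
            # Selection: draw z from the empirical measure of past states.
            x = rng.choice(history)
        # Mutation: one move of K from the current (possibly resampled) state.
        x = k_step(x, rng)
        history.append(x)
    return history

# Hypothetical usage with a placeholder mutation (not a pi-invariant kernel).
rng = random.Random(1)
path = self_interacting_chain(lambda x, r: x + r.gauss(0.0, 1.0), 0.0, 100, 0.2, rng)
```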



1.2. Motivation and structure of the paper 



In the context of stochastic simulation, self-interacting Markov chains (SIMCs), or IMCs, 
can be thought of as storing modes and then allowing the algorithm to return to them 
in a relatively simple way. Parametric adaptive MCMC can be thought of as an indirect 
application of this idea, where parameters of the kernel are optimized via a stochastic 
approximation algorithm. This approach does not retain all of the features of previously 
visited states. In other words, SIMCs can be considered as a nonparametric, or infinite- 
dimensional, generalization of parametric adaptive MCMC. It is thus the attractive idea 
of being able to fully exploit the information provided by the previous samples that has 
motivated us to investigate such algorithms. 

This paper is structured as follows. We begin by giving our notation in Section 2. In 
Section 3 our simulation methods are described and several nonlinear Markov kernels and 
self-interacting approximations are introduced. In Section 4 we introduce some assump- 
tions and some preliminary results, which are used to prove a strong law of large numbers 
(SLLN). In Sections 5 and 6, some technical proofs and the SLLN are presented; this is 
for a particular nonlinear kernel introduced in Section 3. This analysis is of interest from 
a theoretical point of view: it brings together the literature of measure-valued processes
and interacting particle systems [10] used in SMC and the relatively recent literature on 
general state space Markov chains [25] used in MCMC. In Section 7 some algorithms are 
investigated; our assumptions are verified and some parameter settings are investigated 
for a toy example. In Section 8 some extensions to our ideas are discussed. The proofs 
are all given in the Appendices. 
2. Notation and definitions

2.1. Notation 

2.1.1. Probability and measure 

Define a measurable space $(E,\mathcal{E})$. Throughout, $\mathcal{E}$ will be assumed countably generated.
$\mathcal{B}(\mathbb{R}^k)$, $k\in\mathbb{N}$, is used to represent the Borel sets with Lebesgue measure denoted by $dx$.
For a stochastic process $\{X_n\}_{n\ge0}$ on $(E^{\mathbb{N}},\mathcal{E}^{\otimes\mathbb{N}})$, $\mathcal{G}_n^X=\sigma(X_0,\dots,X_n)$ denotes the
natural filtration. $\mathbb{P}_\mu$ is taken as a probability law of a stochastic process with initial
distribution $\mu$ and $\mathbb{E}_\mu$ the associated expectation. If $\mu=\delta_x$, with $\delta$ the Dirac measure,
$\mathbb{P}_x$ (resp., $\mathbb{E}_x$) is used instead of $\mathbb{P}_{\delta_x}$ (resp., $\mathbb{E}_{\delta_x}$). For $\mu\in\mathscr{P}(E)$, the product measure is
written $\mu\times\mu=\mu^{\otimes2}$, with a clear generalization to higher order products. For measurable
$f:E\to\mathbb{R}$, $\mu(f):=\int_E f(x)\mu(dx)$.

If a $\sigma$-finite measure $\pi$ is dominated by another $\eta$ (denoted $\pi\ll\eta$), the Radon-Nikodym
derivative is written with the same notation (e.g., if $\pi\ll\eta$, then $\pi(x)/\eta(x) = d\pi/d\eta(x)$).
For $\sigma$-finite measures $\pi$ and $\eta$, $\pi\sim\eta$ denotes mutual absolute continuity.

2.1.2. Markov chains 

Let $(E,\mathcal{E})$ be a measurable space. Throughout, for a Markov transition kernel $K:E\to\mathscr{P}(E)$
the following standard notation is used: for measurable $f:E\to\mathbb{R}$, $K(f)(x) := \int_E f(y)K(x,dy)$
and for $\mu\in\mathscr{P}(E)$, $\mu K(f) := \int_E K(f)(x)\mu(dx)$.

For $K_\mu$, $K:E\times\mathscr{P}(E)\to\mathscr{P}(E)$, given its existence, we will denote by $\omega(\mu)$
($\omega:\mathscr{P}(E)\to\mathscr{P}(E)$) the invariant distribution of this Markov kernel. Recall that the
empirical measure of an arbitrary stochastic process $(E^{\mathbb{N}},\mathcal{E}^{\otimes\mathbb{N}},\{X_n\}_{n\ge0},\mathbb{P})$ is defined, at
time $n$, as

$$S_n^X(du) := \frac{1}{n+1}\sum_{i=0}^{n}\delta_{X_i}(du). \qquad (2.1)$$

Throughout this paper, we are concerned with two nonlinear kernels of the form

$$K_\mu(x,dy) = (1-\epsilon)K(x,dy) + \epsilon\,\Phi(\mu)(dy),$$

where $K:E\to\mathscr{P}(E)$, $\Phi:E\times\mathscr{P}(E)\to\mathscr{P}(E)$ (see [10] for more on $\Phi$) and

$$K_\mu(x,dy) = (1-\epsilon)K(x,dy) + \epsilon\,Q_\mu(x,dy), \qquad (2.2)$$

$$Q_\mu(f)(x) = \int_E \mu(du)\,\alpha(x,u)[f(u)-f(x)] + f(x),$$

where $\alpha(x,u)$ is defined later on.
2.1.3. Norms

For any $k\in\mathbb{N}$, the Euclidean norm of $x\in\mathbb{R}^k$ is denoted $|x|$. For $f:E\to\mathbb{R}^n$, $n\in\mathbb{N}$,
$|f|_\infty := \sup_{x\in E}|f(x)|$. For $f:E\to\mathbb{R}^n$ the $L_p$-norm is defined, assuming it exists, as
$(\int_E |f(x)|^p\,d\mu)^{1/p}$ for $\mu\in\mathscr{P}(E)$. For $V:E\to[1,\infty)$ and $f:E\to\mathbb{R}^n$

$$|f|_V := \sup_{x\in E}\frac{|f(x)|}{V(x)}.$$

$\mathscr{L}_V$ is the class of functions $f:E\to\mathbb{R}^n$ such that $|f|_V<\infty$. We also use the notions of
the $V$-total variation for a signed measure $\lambda$

$$\|\lambda\|_V := \sup_{|f|\le V}|\lambda(f)|,$$

and the $V$-norm operator between two kernels $K_1,K_2:E\to\mathscr{P}(E)$

$$|||K_1-K_2|||_V := \sup_{x\in E}\frac{\|K_1(x,\cdot)-K_2(x,\cdot)\|_V}{V(x)}.$$

2.1.4. Miscellaneous 

The notation $a\vee b := \max\{a,b\}$ (resp., $a\wedge b := \min\{a,b\}$) is adopted. The indicator
function of $A\subset E$ is written $\mathbb{I}_A(x)$. $\mathbb{N}_0 = \mathbb{N}\cup\{0\}$. Throughout the paper we denote a generic
finite constant as $M$, that is, the value of $M$ may change from line to line in the proofs
and is local to each proof.

3. Nonlinear MCMC 

3.1. Nonlinear Markov kernels 

Nonlinear MCMC can be characterised by the following procedure: 

• Identify a nonlinear kernel that admits $\pi$ as an invariant distribution and can be
expected to mix faster than an ordinary MCMC kernel; for example, (1.2). 

• Construct a stochastic process that approximates the kernel, which can be simulated 
in practice. 

Based upon the previous work [3], we consider auxiliary stochastic processes to ap- 
proximate the nonlinear kernel. That is, it has been found in [3] that using the past 
history to approximate the nonlinear kernel leads to very poor performance. All of the 
processes that are simulated in this paper use an auxiliary Markov chain to approximate
the nonlinear kernel. The difficulty is then to design sensible nonlinear kernels that may 
lead to good empirical performance. The two kernels we have designed are below. 
3.2. Selection/mutation with potential

Let $P$ be an MCMC kernel of invariant distribution $\eta$, and assume $\pi\ll\eta$. Let $g = d\pi/d\eta$
and set $K$ to be an MCMC kernel of invariant distribution $\pi$. Consider the nonlinear
kernel

$$K_\mu(x,dx') = (1-\epsilon)K(x,dx') + \epsilon\,\Phi(\mu)(dx');$$

clearly, if $\mu=\eta$, then one has $\pi K_\eta = \pi$.

If it is possible to sample exactly from $\eta$, then one could sample exactly from $K_\eta$.
However, for efficient algorithms, this will not be the case. The following approximation
is adopted at time-step $n+1$ of the simulation:

$$[(1-\epsilon)K(x_n,dx_{n+1}) + \epsilon\,\Phi(S_n^Y)(dx_{n+1})]\,P(y_n,dy_{n+1});$$

that is, we are 'feeding' the chain $\{X_n\}_{n\ge0}$ the empirical measure $S_n^Y$. Intuitively, as $n$
grows large, $S_n^Y(f)\to\eta(f)$ and one samples from the original kernel of interest.
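
One possible reading of the selection/mutation step is sketched below. It assumes that $\Phi$ is the Boltzmann-Gibbs transformation $\Phi(\mu)(dx) = g(x)\mu(dx)/\mu(g)$ of [10], which the text does not spell out, so that sampling from $\Phi(S_n^Y)$ amounts to picking a stored $Y_i$ with probability proportional to $g(Y_i)$. The names `sample_phi_empirical`, `selection_mutation_step`, `k_step` and `log_g` are hypothetical.

```python
import math
import random

def sample_phi_empirical(ys, log_g, rng):
    """Draw from Phi(S_n^Y): pick Y_i with probability proportional to g(Y_i)."""
    log_w = [log_g(y) for y in ys]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return rng.choices(ys, weights=weights, k=1)[0]

def selection_mutation_step(x, ys, eps, k_step, log_g, rng):
    """One draw from (1 - eps) K(x, .) + eps Phi(S_n^Y)(.)."""
    if rng.random() < eps:
        return sample_phi_empirical(ys, log_g, rng)  # selection from the auxiliary chain
    return k_step(x, rng)                            # mutation with the pi-invariant kernel
```

Only the unnormalized ratio $\pi/\eta$ is needed in `log_g`, since the weights are renormalized inside `sample_phi_empirical`.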

3.3. Auxiliary self-interaction with genetic moves 

For any $\mu\in\mathscr{P}(E)$ we define a nonlinear Markov kernel $Q_\mu:\mathscr{P}(E)\times E\to\mathscr{P}(E)$

$$Q_\mu(f)(x) = \int_E \mu(du)\,\alpha(x,u)[f(u)-f(x)] + f(x)$$

and for $\pi\sim\eta$

$$\alpha(x,y) = 1\wedge\frac{\pi(y)\eta(x)}{\pi(x)\eta(y)}.$$

The idea here is to generate a sample from $\mu$ and accept or reject it as the new state
on the basis of the probability $\alpha$. Clearly, $\pi Q_\eta = \pi$. Letting $K$ and $P$ be as above, the
process is simulated according to

$$\{(1-\epsilon)K(x_n,dx_{n+1}) + \epsilon\,Q_{S_n^Y}(x_n,dx_{n+1})\}\,P(y_n,dy_{n+1})$$

at time $n+1$.
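
A draw from $Q_{S_n^Y}(x,\cdot)$ can be sketched as a swap-type accept/reject move: propose a stored auxiliary state uniformly at random and accept it with probability $\alpha$. The function name `q_step` and the use of log-densities (`log_pi`, `log_eta`, both possibly unnormalized) are assumptions made for this illustration.

```python
import math
import random

def q_step(x, ys, log_pi, log_eta, rng):
    """One draw from Q_mu(x, .) with mu the empirical measure of ys."""
    u = rng.choice(ys)  # propose u ~ S_n^Y
    # log of pi(u) eta(x) / (pi(x) eta(u)); accept with probability min(1, ratio)
    log_ratio = (log_pi(u) - log_eta(u)) - (log_pi(x) - log_eta(x))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return u
    return x  # rejection: the chain stays at x
```

Normalizing constants of $\pi$ and $\eta$ cancel in the ratio, so unnormalized log-densities suffice.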



3.4. Some comments 

In the example in Section 3.2 we attempt to use some measure of information, through $g$,
to assist the resampling. The example of Section 3.3 provides a way to control the
information that is provided by the approximation $S_n^Y$. That is, the kernel $Q_{S_n^Y}$, via $\alpha$ and
the possible rejection, will provide a criterion to check the consistency with the target
of the value drawn from $S_n^Y$. This may help improve estimation, if $S_n^Y$ converges slowly.
Note that the algorithm is related to, but less sophisticated than, that of [23]. This is
because we do not consider exchanges to occur between states in equi-energy rings. 

It should be remarked that similar kernels are investigated in [7]. The author deduces 
that for a toy example it is hard to justify the use of such adaptive methods. However, 
a potential criticism of that study is that it is for a unimodal target; 'advanced' methods 
are seldom necessary for such scenarios. This is discussed further in Section 7.3. 



3.5. Algorithm 

The algorithm is (with the appropriate $\Phi(\mu)$ or $Q_\mu$):

0. (Initialization): Set $n=0$ and $X_0=x$, $Y_0=y$, $S_0^Y=\delta_y$.

1. (Iteration): Set $n=n+1$, simulate $Y_n\sim P(y_{n-1},\cdot)$ and $X_n\sim K_{S_{n-1}^Y}(x_{n-1},\cdot)$.

2. (Update): $S_n^Y = S_{n-1}^Y + \frac{1}{n+1}[\delta_{y_n} - S_{n-1}^Y]$ and return to 1.
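
The update in step 2 is the usual running-average recursion. As a small check, applied to a test function $f$ it reproduces the batch average of (2.1); the values and the name `update_mean` below are hypothetical and serve only to illustrate this.

```python
def update_mean(prev_mean, new_value, n):
    """Step 2 applied to a test function f:
    S_n(f) = S_{n-1}(f) + (f(Y_n) - S_{n-1}(f)) / (n + 1)."""
    return prev_mean + (new_value - prev_mean) / (n + 1)

ys = [2.0, 4.0, 9.0, 1.0]      # hypothetical values f(Y_0), ..., f(Y_3)
running = ys[0]                # S_0 = delta_{Y_0}
for n in range(1, len(ys)):
    running = update_mean(running, ys[n], n)
batch = sum(ys) / len(ys)      # direct evaluation of (2.1)
```

Note that drawing from $S_n^Y$ itself (as the selection steps require) still needs the stored auxiliary states; the recursion only avoids recomputing averages.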

4. Assumptions 

We now seek to prove an SLLN for the nonlinear MCMC algorithm described
in Section 3.3. Recall that we simulate a stochastic process on
$((E\times E)^{\mathbb{N}},(\mathcal{E}\otimes\mathcal{E})^{\otimes\mathbb{N}},\{X_n,Y_n\}_{n\ge0},\{\mathcal{G}_n\}_{n\ge0},\mathbb{P}_{(x,y)})$, $(x,y)\in E\times E$, with finite-dimensional law:

$$\mathbb{P}_{(x,y)}(d(x_0,y_0,\dots,x_n,y_n)) = \delta_{(x,y)}(d(x_0,y_0))\prod_{i=0}^{n-1}K_{S_i^Y}(x_i,dx_{i+1})P(y_i,dy_{i+1}).$$

Note that the natural filtration is denoted as $\mathcal{G}_n=\mathcal{G}_n^{X,Y}$ for notational simplicity. Since
$\{Y_n\}$ is generated independently of $\{X_n\}$, we denote the probability law of the Markov
chain $\{Y_n\}$ as $\mathbb{Q}_y$. Note, again, that the proofs are given in the Appendices.

4.1. Assumptions 

Our assumptions on $K$, used to define our process, are now given. For $M\in\mathbb{R}_+$, the
notation $\mathscr{P}_M(E) = \{\mu\in\mathscr{P}(E):\mu(V)<M\}$ is adopted, with $V$ defined below. In the
remainder of the paper we say that a set $C\subset E$ is $(1,\theta)$-small if it satisfies a 1-step
minorization condition, with parameter $\theta\in(0,1)$.

(A1) Stability of $K$.

(i) (Invariance and irreducibility). $K:E\to\mathscr{P}(E)$ is a $\pi$-invariant and
$\psi$-irreducible Markov kernel.
(ii) (One-step minorization on level sets). Define $C_d := \{x\in E:V(x)<d\}$ for
any $d\in(1,\infty)$. We assume that for any $d>1$, $C_d$ is $(1,\theta_d)$-small for some
$\theta_d\in(0,1)$ and $\nu_d\in\mathscr{P}(E)$.
(iii) (One-step drift condition). There exist $V:E\to[1,\infty)$ such that
$\lim_{|x|\to\infty}V(x)=\infty$, $\lambda<1$, $b<\infty$, $C\in\mathcal{E}$ such that for any $x\in E$

$$KV(x)\le\lambda V(x)+b\,\mathbb{I}_C(x).$$

(A2) Stability of $P$.

(i) ($W$-uniform ergodicity). $P:E\to\mathscr{P}(E)$ is an $\eta$-invariant Markov kernel.
Furthermore, there exists $W:E\to[1,\infty)$ such that $P$ is a $W$-uniformly
ergodic Markov transition kernel with a one-step drift condition and one-step
minorization condition. In addition, there exists an $r^*\in(0,1]$ such that
$V\in\mathscr{L}_{W^{r^*}}$ (where $V:E\to[1,\infty)$ is defined in (A1)(iii)).

(A3) State-space constraint.

$(E,\mathcal{E})$ is Polish.

4.2. Discussion of the assumptions 

Our proofs of the SLLN will rely upon a martingale approximation via the solution of
the Poisson equation (e.g., [17]). For any $M<\infty$, (A1) will allow us to establish a drift
condition for the kernel $K_\mu$ that is uniform in $\mu\in\mathscr{P}_M(E)$; see [5]. In turn, one can
establish: the existence of a solution to Poisson's equation, the existence of an invariant
measure $\omega(\mu)$ for $K_\mu$ and regularity properties uniform in $\mu\in\mathscr{P}_M(E)$. Then, due to (A2),
the following facts are exploited: $\{S_n^Y(V)\}$ is $\mathbb{Q}_y$-a.s. finite and given $\{S_n^Y(V)\}$, $\{X_n\}$
is a Markov chain. (A1) and (A2) appear quite strong, but can be verified in some
important cases such as for random walk Metropolis kernels; see [21], for example.

A key result, relying on both (A2) and (A3), which is of interest in itself, is that of
the $\mathbb{Q}_y$-a.s. convergence of $U$-statistics of $\{Y_i\}$. This result will enable us to show that,
$\mathbb{Q}_y$-a.s., $\omega(S_i^Y)\to\omega(\eta)$; this is needed for our proof.



5. Common properties of $K_\mu$

Using standard drift and minorization conditions, the existence of an invariant probability
measure is established for any $\mu\in\mathscr{P}_\infty(E)$ under (A1).

Proposition 5.1. Assume (A1). Let $\epsilon\in(0,1)$ as in (2.2), $M\in(0,\infty)$, then for
$d>\epsilon M/[(1-\epsilon)(1-\lambda)]$ with $\lambda$ and $b$ as in (A1)(iii):

1. There exist $(\theta'_d,\nu_d)\in(0,1)\times\mathscr{P}(E)$ such that for any $\mu\in\mathscr{P}_M(E)$ and
$(x,A)\in E\times\mathcal{E}$:

$$K_\mu(x,A)\ge\mathbb{I}_{C_d}(x)\,\theta'_d\,\nu_d(A),$$

$$K_\mu V(x)\le\tilde\lambda V(x)+\tilde b\,\mathbb{I}_{C_d}(x)$$

with $\tilde\lambda = (1-\epsilon)\lambda+\epsilon+\frac{\epsilon M}{d}<1$, $\tilde b = (1-\epsilon)[\lambda d+b]+\epsilon[M+d]$.

2. There exists a function $\omega:\mathscr{P}_\infty(E)\to\mathscr{P}_\infty(E)$, such that for any $\mu\in\mathscr{P}_\infty(E)$

$$\omega(\mu)K_\mu = \omega(\mu).$$

3. There exist constants, $\rho\in(0,1)$ and $\bar M<\infty$ depending upon $M$, $\epsilon$, $\lambda$, $b$, $V$, $d$, $\theta_d$
(as defined in equation (2.2) and (A1)), such that for any $\mu\in\mathscr{P}_M(E)$, $r\in(0,1]$
and $f\in\mathscr{L}_{V^r}$

$$|K_\mu^n(f)-\omega(\mu)(f)|_{V^r}\le\bar M|f|_{V^r}\rho^n.$$
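
To make the constants of item 1 concrete, the following sketch evaluates $\tilde\lambda$ and $\tilde b$ and enforces the lower bound on $d$, which is exactly the condition under which $\tilde\lambda<1$. The numerical inputs and the name `drift_constants` are hypothetical and chosen only for illustration.

```python
def drift_constants(eps, lam, b, M, d):
    """Return (lambda_tilde, b_tilde) from Proposition 5.1, item 1."""
    d_min = eps * M / ((1.0 - eps) * (1.0 - lam))
    if d <= d_min:
        raise ValueError("d must exceed eps*M/((1-eps)*(1-lam))")
    lam_tilde = (1.0 - eps) * lam + eps + eps * M / d
    b_tilde = (1.0 - eps) * (lam * d + b) + eps * (M + d)
    return lam_tilde, b_tilde

# Hypothetical values chosen only to illustrate the formulas.
lam_tilde, b_tilde = drift_constants(eps=0.1, lam=0.5, b=1.0, M=10.0, d=5.0)
```

The threshold `d_min` diverges as `eps` approaches 1, so larger interaction probabilities force larger level sets in the drift condition.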

Some continuity properties associated with the invariant measures are as follows.

Proposition 5.2. Assume (A1) and let $M\in(0,\infty)$. Then there exists $\bar M<\infty$ (depending
solely on $M$ and the constants in (A1)) such that for any $r\in(0,1]$, $\mu,\xi\in\mathscr{P}_M(E)$,

$$\|\omega(\mu)-\omega(\xi)\|_{V^r}\le\bar M\,|||K_\mu-K_\xi|||_{V^r}.$$

Noting that for any $\mu,\xi\in\mathscr{P}(E)$ and $r\in[0,1]$, $|||K_\mu-K_\xi|||_{V^r} = \epsilon\,|||Q_\mu-Q_\xi|||_{V^r}$, we
establish global Lipschitz continuity results for $\mu\mapsto Q_\mu$, which, together with the result
above, will allow us to deduce uniform Lipschitz continuity of $\mu\mapsto K_\mu$ on $\mathscr{P}_M(E)$ for
any $M\in(0,\infty)$. This is to be used in the proofs of many of the subsequent results.

Proposition 5.3. Let $\mu,\xi\in\mathscr{P}_\infty(E)$, then for any $r\in(0,1]$:

$$|||Q_\mu-Q_\xi|||_{V^r}\le2\|\mu-\xi\|_{V^r}.$$

6. Law of large numbers 

6.1. Main result 

Our main result is the following SLLN.

Theorem 6.1. Assume (A1)-(A3). Let $r\in[0,1)$. Then for any $f\in\mathscr{L}_{V^r}$, $(x,y)\in E\times E$

$$S_n^X(f)\to\pi(f)\qquad\mathbb{P}_{(x,y)}\text{-a.s.}$$

The proof is detailed in Appendix B, but we outline its main steps below.

6.2. Strategy of the proof

The strategy of the proof is now outlined. Introduce the following sequence of probability
distributions $\{S_n^\omega := \frac{1}{n+1}\sum_{i=0}^{n}\omega(S_i^Y)\}_{n\ge0}$, where $\omega(\mu)$ is the invariant measure
of $K_\mu$ (which, if $\mu=S_i^Y$, exists $\mathbb{Q}_y$-a.s.). This distribution can be used as a re-centering
term in the following decomposition,

$$S_n^X(f)-\pi(f) = S_n^X(f)-S_n^\omega(f)+S_n^\omega(f)-\pi(f). \qquad (6.1)$$

Let $\mu\in\{S_n^Y\}$ and assume, for now, the almost sure existence of a solution $\hat f_\mu$ to
Poisson's equation, that is, such that for any $x\in E$

$$f(x)-\omega(\mu)(f) = \hat f_\mu(x)-K_\mu(\hat f_\mu)(x).$$

Then, the first term on the right-hand side of (6.1) can be rewritten as

$$(n+1)[S_n^X-S_n^\omega](f) = M_{n+1}+\sum_{m=0}^{n}[\hat f_{S_{m+1}^Y}(X_{m+1})-\hat f_{S_m^Y}(X_{m+1})]+\hat f_{S_0^Y}(X_0)-\hat f_{S_{n+1}^Y}(X_{n+1}), \qquad (6.2)$$

where

$$M_n = \sum_{m=0}^{n-1}[\hat f_{S_m^Y}(X_{m+1})-K_{S_m^Y}(\hat f_{S_m^Y})(X_m)]$$

is such that $\{M_n,\mathcal{G}_n\}$ will be a martingale conditional upon $\mathcal{G}^Y$, the filtration generated
by $\{Y_n\}$. In addition, critical to our analysis, will be that, $\mathbb{Q}_y$-a.s., $\{S_n^Y(V)\}$ is finite.
This latter fact will enable us to control the various terms in (6.2) on events of the type
$\{\sup_{k\ge0}S_k^Y(V)<M\}$ for $M>0$. This is now elaborated.



6.3. $\{M_n\}$ is $L_p$-bounded

One can establish the following uniform in time $L_p$-bounds of the solution to Poisson's
equation and the sequence $\{M_n\}$, restricted to events $\{\sup_{k\ge0}S_k^Y(V)<M\}$ for any
$M>0$.

Proposition 6.2. Assume (A1). Let $r\in[0,1]$, $p\in[1,1/r]$ for $r\ne0$ and $p\ge1$ otherwise
and $M\in(0,\infty)$. Then there exists $\bar M<\infty$ such that for any $f\in\mathscr{L}_{V^r}$, $(x,y)\in E\times E$
and any $m\in\mathbb{N}_0$,

$$\mathbb{E}_{(x,y)}[|\hat f_{S_m^Y}(X_{m+1})|^p\,\mathbb{I}_{\{\sup_{k\ge0}S_k^Y(V)<M\}}]^{1/p}\le\bar MV(x)^r.$$

Proposition 6.3. Assume (A1). Let $r\in[0,1]$, $p\in[1,1/r]$ for $r\ne0$ and $p\ge1$ otherwise
and $M\in(0,\infty)$. Then there exists $\bar M<\infty$ such that for any $f\in\mathscr{L}_{V^r}$, $(x,y)\in E\times E$
and any $n\in\mathbb{N}_0$,

$$\mathbb{E}_{(x,y)}[|M_n|^p\,\mathbb{I}_{\{\sup_{k\ge0}S_k^Y(V)<M\}}]^{1/p}\le n^{1/2\vee1/p}\,\bar MV(x)^r.$$

This result will allow us to prove the $\mathbb{P}_{(x,y)}$-a.s. convergence of $M_n/n$ to zero (cf.
Appendix B).

6.4. Smoothness of the solution to Poisson's equation and $\omega(S_n^Y)$

As can be observed in (6.2), we have to control the fluctuations of the solution of the
Poisson equation $\{\hat f_{S_{m+1}^Y}(X_{m+1})-\hat f_{S_m^Y}(X_{m+1})\}$. Also, in (6.1), the convergence of $\omega(S_n^Y)(f)$
to $\omega(\eta)(f)$ $\mathbb{Q}_y$-a.s. must be established. Both of these issues are now dealt with.

Proposition 6.4. Assume (A1) and (A2). Let $r\in[0,1)$, then for any $f\in\mathscr{L}_{V^r}$,
$(x,y)\in E\times E$

$$\lim_{m\to\infty}|\hat f_{S_{m+1}^Y}(X_{m+1})-\hat f_{S_m^Y}(X_{m+1})| = 0\qquad\mathbb{P}_{(x,y)}\text{-a.s.}$$

Proposition 6.5. Assume (A1)-(A3). Let $f\in\mathscr{L}_V$ and $(x,y)\in E\times E$, then

$$\lim_{n\to\infty}\omega(S_n^Y)(f) = \omega(\eta)(f)\qquad\mathbb{Q}_y\text{-a.s.}$$

7. Examples 

In this section we present some applications of our algorithms. Specifically, it is demon- 
strated that the assumptions hold in some very general scenarios. In addition, a numerical 
investigation of our approach for a toy problem is given. 

7.1. Verifying the assumptions 

It is now shown that it is possible to verify the assumptions in Section 4.1 in quite general
scenarios. Let us concentrate upon the case where, for $k\ge1$, $(E,\mathcal{E}) = (\mathbb{R}^k,\mathcal{B}(\mathbb{R}^k))$ and $K$
(resp., $P$; recall the invariant measure is $\eta$) is a symmetric random walk Metropolis
kernel:

$$K(x,dx') = \alpha_\pi(x,x')q_\pi(x-x')\,dx'+\delta_x(dx')\Big\{1-\int_{\mathbb{R}^k}\alpha_\pi(x,x')q_\pi(x-x')\,dx'\Big\}, \qquad (7.1)$$

where (resp., $P$)

$$\alpha_\pi(x,x') = 1\wedge\frac{\pi(x')}{\pi(x)}$$

and $q_\pi$ (resp. $q_\eta$) is a symmetric density (w.r.t. Lebesgue measure).
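
One move of a kernel of the form (7.1) can be sketched in one dimension as follows. The Gaussian choice for the symmetric increment density, the `scale` parameter and the function name `rwm_step` are assumptions made for this illustration; `log_target` may be unnormalized.

```python
import math
import random

def rwm_step(x, log_target, scale, rng):
    """One move of kernel (7.1) with a Gaussian (symmetric) increment density."""
    proposal = x + rng.gauss(0.0, scale)
    log_alpha = min(0.0, log_target(proposal) - log_target(x))
    if rng.random() < math.exp(log_alpha):
        return proposal   # accepted: move to x'
    return x              # rejected: the delta_x term in (7.1)
```

Passing the log-density of $\eta$ instead of $\pi$ gives the corresponding move for the auxiliary kernel $P$.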
7.1.1. Assumptions

A set of general conditions is introduced, such that the assumptions in Section 4.1 will
hold.

(M1) Density $\pi$.

• $\pi$ admits a positive and continuous density w.r.t. Lebesgue measure.

(M2) Definition of $\eta$.

• $\eta(x)\propto\pi(x)^\alpha$, with $\alpha\in(0,1)$.

(M3) Boundedness.

• $\pi$ is upper bounded and bounded away from 0 on compact sets.

(M4) Super-exponential densities.

• $\pi$ is super exponential:

$$\lim_{|x|\to\infty}\frac{x}{|x|}\cdot\nabla\log(\pi(x)) = -\infty.$$

(M5) Regularity of contours.

• The contours of $\pi$ are asymptotically regular:

$$\limsup_{|x|\to+\infty}\frac{x}{|x|}\cdot\frac{\nabla\pi(x)}{|\nabla\pi(x)|}<0.$$

(M6) Lower bounds on $q_\pi$, $q_\eta$.

• Both $q_\pi$ and $q_\eta$ are such that there exists $\delta_{q_\pi}>0$ (resp., $\delta_{q_\eta}>0$) and $\epsilon_{q_\pi}>0$
(resp., $\epsilon_{q_\eta}>0$) such that

$$q_\pi(x)\ge\epsilon_{q_\pi}\quad\text{for } |x|\le\delta_{q_\pi}$$

(resp., $q_\eta(x)\ge\epsilon_{q_\eta}$ for $|x|\le\delta_{q_\eta}$).

7.2. Result 

Proposition 7.1. Assume (M1)-(M6), then (A1)-(A3) hold for any r* e (0,1) with 

W{x)^ ^ , S,„G(0,1), 

y(x) = —7^ , Sye{0,r*as.ui). 

[7r(2;) J 

The proof is in Appendix F. 
7.2.1. Some comments

The conditions presented above are quite general. For example, they are satisfied if $\pi$ is
a mixture of normals. More generally, it may be difficult to check the assumptions, but
this is due to the underlying nature of the geometric ergodicity assumptions; see [21] for
more thorough investigations.

7.3. Toy example 

Our target distribution is

$$\pi(x) = 0.4\,\psi(x;0,0.5)+0.6\,\psi(x;17.5,1)$$

with $\psi(x;\mu,\sigma^2)$ the normal density of mean $\mu$ and variance $\sigma^2$.

Our algorithms are run with $K$ as a random walk Metropolis kernel with normal
random walk proposal density. The kernel is iterated 500 times (i.e., $K=\tilde K^{500}$ with $\tilde K$ as
a random walk Metropolis kernel); this is to reduce the amount of interaction, especially
for large $\epsilon$. $\eta$ was taken to be:

7;(a;) oc 7r(x)"-^^ 

The algorithms were run for the same CPU time and the results can be found, for 50 
runs of the algorithm, in Table 1. The assumptions (M1)-(M6) are satisfied here. 
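
For readers who wish to reproduce the setting, the target and a tempered auxiliary density of the form (M2) can be coded as below. This is a sketch only: the function names are hypothetical, the exponent `alpha` is left as a parameter, and the log-sum-exp form is a numerical-stability choice made here rather than something taken from the paper.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log-density of a normal with the given mean and variance."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

def log_pi(x):
    """Log-density of the two-component mixture target of Section 7.3."""
    a = math.log(0.4) + log_normal_pdf(x, 0.0, 0.5)
    b = math.log(0.6) + log_normal_pdf(x, 17.5, 1.0)
    m = max(a, b)  # log-sum-exp avoids underflow far from both modes
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_eta(x, alpha):
    """Unnormalized log-density of eta proportional to pi^alpha, as in (M2)."""
    return alpha * log_pi(x)

true_mean = 0.4 * 0.0 + 0.6 * 17.5   # mixture mean, the quantity estimated in Table 1
```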

In Table 1, the algorithms in Sections 3.2 and 3.3 both perform reasonably well for
small values of $\epsilon$. As expected, from the assumptions, as $\epsilon$ gets larger the accuracy falls.
This is due to the fact that the amount of auxiliary information that can enter into the
$\{X_n\}$ process is increased. For small $\epsilon$, the example in Section 3.3 appears to work better
(more accurate estimation) due to the more sophisticated interaction with the auxiliary
chain. The drastically poor performance for the kernel in Section 3.3, for large $\epsilon$, is due to
the fact that no transition occurs after the swapping move.

Table 1. Estimates from mixture comparison for nonlinear MCMC. The estimates are for the
expectation of $X$; the true value is 10.5. Each algorithm is run 50 times for 2 million iterations
after a 50 000 iteration burn-in (Section 3.3; the simulations for Section 3.2 are adjusted for the
appropriate CPU time). The brackets are ±2 standard deviations across the repeats.

Example       $\epsilon$ = 0.05     $\epsilon$ = 0.25     $\epsilon$ = 0.5      $\epsilon$ = 0.75     $\epsilon$ = 0.95
Section 3.2   10.32 (±0.08)   10.74 (±0.12)   10.89 (±0.19)   10.37 (±0.18)   10.99 (±0.20)
Section 3.3   10.57 (±0.04)   10.52 (±0.09)   10.96 (±0.7)    10.02 (±0.93)   11.08 (±1.20)

To compare to the results of [7], we ran a random walk algorithm for 1 million iterations
50 times and a nonlinear algorithm (Section 3.3). The nonlinear algorithm was run with
$\epsilon = 0.01$ but the random walk kernel was not iterated. The auxiliary chain was run
with $\alpha = 0.75$ (as in (M2)). This was run for 110 000 iterations 50 times (which is
approximately the same CPU time as for the random walk Metropolis algorithm). Both
algorithms are such that all initial values are drawn from a uniform on [0,10.5]. The
estimated value for the first moment is 6.93 ± 16.96 (±2 standard deviations, across
the 50 runs) and 10.41 ± 2.03 for the random walk and nonlinear methods, respectively.
The random walk algorithm is unable to jump between the modes of the target, while
the auxiliary chain is able to do so; hence justifying our earlier intuition. This slightly
contradicts the 'cautionary tale' in [7] as it illustrates that such algorithms are potentially
useful in cases where random walk algorithms do not work well. We remark, however,
that one must be careful with allowing too much auxiliary information to enter the chain
$\{X_n\}_{n\ge0}$; this can lead to poor results. This is consistent with Proposition 5.1, which
indicates that $d$ grows as $\epsilon$ goes to 1.

8. Summary 

We have investigated a new approach to stochastic simulation: Nonlinear MCMC via 
auxiliary/self-interacting approximations. Convergence results for several algorithms 
were established and the algorithm was demonstrated on a toy example. As extensions 
to our ideas, the following may be considered. 

First, the conditions required for convergence may be relaxed. For example, [17] es- 
tablishes weaker-than-geometric ergodicity assumptions for the solution to the Poisson 
equation and functional central limit theorem; also, [15] establishes drift conditions for 
polynomial ergodicity. It would be of interest to see whether such conditions would be
sufficient for the convergence of our algorithms; see [28] for proofs for parametric adaptive 
MCMC. 

Second, it would be interesting to design more elaborate methods to control the evo- 
lution of the empirical measure. In our current algorithms, the empirical measure is only 
updated through the addition of simulated points. It may enhance the algorithm to in- 
troduce some mechanisms allowing the improvement of this quantity; for example, we 
could introduce a death process with a rate associated with the un-normalized target
distribution. 



Appendix A: Common properties of $K_\mu$



Proof of Proposition 5.1. The second and third statement of the proposition are a
direct consequence of the first point from [24], Theorem 2.3 (note the $\psi$-irreducibility and
aperiodicity follow immediately). The minorization property is direct from the expression
for $K_\mu$ and (A1)(ii) with $\theta'_d = (1-\epsilon)\times\theta_d$. Let us focus on the drift condition.
For any $x\in E$, $\mu\in\mathscr{P}_M(E)$:

$$K_\mu(V)(x)\le(1-\epsilon)[\lambda V(x)+b\,\mathbb{I}_{C_d}(x)]+\epsilon[\mu(V)+V(x)\varrho(x)],$$

where $\varrho(x) = 1-\int_E\alpha(x,y)\mu(dy)$. Then as $\mu(V)<M$, one has

$$K_\mu(V)(x)\le(1-\epsilon)[\lambda V(x)+b\,\mathbb{I}_{C_d}(x)]+\epsilon[M+V(x)].$$

For $x\notin C_d$, since $V(x)\ge d$,

$$K_\mu(V)(x)\le\Big[(1-\epsilon)\lambda+\epsilon+\frac{\epsilon M}{d}\Big]V(x) = \tilde\lambda V(x).$$

For $x\in C_d$

$$K_\mu(V)(x)\le(1-\epsilon)[\lambda d+b]+\epsilon[M+d]$$

and hence one concludes that

$$K_\mu(V)(x)\le\tilde\lambda V(x)+\tilde b\,\mathbb{I}_{C_d}(x). \qquad\square$$



Proof of Proposition 5.2. This is a direct application of Proposition 5.1 and
Lemma C.1. $\square$

Proof of Proposition 5.3. The proof is given for $r=1$ only. Let $|f|\le V$:

$$[Q_\mu-Q_\xi](f)(x) = \int_E[\mu-\xi](du)\,\alpha(x,u)\{f(u)-f(x)\}.$$

Now it is clear that, for any fixed $x\in E$:

$$|\alpha(x,u)\{f(u)-f(x)\}|\le[V(u)+V(x)],$$

i.e.,

$$|\alpha(x,u)\{f(u)-f(x)\}|\le2V(u)V(x).$$

Thus

$$|[Q_\mu-Q_\xi](f)(x)|\le2V(x)\|\mu-\xi\|_V$$

and then the result easily follows.

Appendix B: Proof of the main result 



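Proof of Theorem 6.1. Let $r\in[0,1)$ and $f\in\mathscr{L}_{V^r}$. Recall the strategy of the proof
outlined in Section 6.2, which relies on the decomposition:

$$S_n^X(f)-\pi(f) = S_n^X(f)-S_n^\omega(f)+S_n^\omega(f)-\pi(f) \qquad (B.1)$$

with

$$(n+1)[S_n^X-S_n^\omega](f) = M_{n+1}+\sum_{m=0}^{n}[\hat f_{S_{m+1}^Y}(X_{m+1})-\hat f_{S_m^Y}(X_{m+1})]+\hat f_{S_0^Y}(X_0)-\hat f_{S_{n+1}^Y}(X_{n+1}),$$
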
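where $\{M_n\}$ is a martingale conditional upon $\mathcal{G}^Y$, the filtration generated by $\{Y_n\}$. Proving
the almost sure convergence of $[S_n^X-S_n^\omega](f)$ relies on classical arguments. For any $n\ge1$,
$\delta>0$ and $M\in(0,\infty)$,

$$\mathbb{P}_{(x,y)}\Big(\sup_{k\ge n}|[S_k^X-S_k^\omega](f)|>\delta\Big)$$

$$\le\mathbb{P}_{(x,y)}\Big(\sup_{k\ge n}|M_{k+1}/(k+1)|>\delta/3,\ \sup_{k\ge0}S_k^Y(V)<M\Big)$$

$$+\ \mathbb{P}_{(x,y)}\Big(\sup_{k\ge n}\Big|\sum_{m=0}^{k}[\hat f_{S_{m+1}^Y}(X_{m+1})-\hat f_{S_m^Y}(X_{m+1})]\Big|\Big/(k+1)>\delta/3,\ \sup_{k\ge0}S_k^Y(V)<M\Big)$$

$$+\ \mathbb{P}_{(x,y)}\Big(\sup_{k\ge n}[|\hat f_{S_0^Y}(X_0)|+|\hat f_{S_{k+1}^Y}(X_{k+1})|]/(k+1)>\delta/3,\ \sup_{k\ge0}S_k^Y(V)<M\Big)$$

$$+\ \mathbb{Q}_y\Big(\sup_{k\ge0}S_k^Y(V)\ge M\Big).$$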

Let $\varepsilon>0$. By assumption there exists $M>0$ such that $\mathbb{Q}_y(\sup_{k\ge0}S_k^Y(V)\ge M)<\varepsilon/4$.
Now we consider the remaining terms on the right-hand side of the above equation from
bottom to top; it is proved that there exists $n_0>0$ such that for any $n\ge n_0$ each of
these terms is less than $\varepsilon/4$. Let $p\in(1,1/r)$. By Proposition 6.2, one can apply Markov's
inequality and a Borel-Cantelli argument to show that the term on the third line vanishes
as $n\to\infty$. By Proposition 6.4 and a Cesàro argument one concludes that the term on
the second line goes to zero as $n\to\infty$. The term dependent on $\{M_n\}$ is dealt with by
using an adaptation of a Birnbaum-Marshall inequality (see [5]) for $p\in(1,1/r)$.
Controlling the bias term requires a more novel approach. Note that

$$|S_n^\omega(f)-\pi(f)| = \Big|\frac{1}{n+1}\sum_{i=0}^{n}[\omega(S_i^Y)-\omega(\eta)](f)\Big|$$

as $\omega(\eta)=\pi$ in our setup. In Proposition 6.5 it is proved that under our assumptions
$[\omega(S_i^Y)-\omega(\eta)](f)\to0$ $\mathbb{Q}_y$-a.s. as $i\to\infty$. We conclude by invoking a Cesàro average
argument. $\square$



Proof of Proposition 6.2. Let $M\in(0,\infty)$. The proof begins by conditioning upon
the filtration $\mathcal{G}^Y$ generated by the auxiliary process $\{Y_n\}$; we then use the geometric
ergodicity, uniform in $\mu\in\mathscr{P}_M(E)$, proved in Proposition 5.1. As a result, there exists
an $\bar M<\infty$ such that

$$\mathbb{E}_{(x,y)}[|\hat f_{S_m^Y}(X_{m+1})|^p\,\mathbb{I}_{\{\sup_{k\ge0}S_k^Y(V)<M\}}]^{1/p}\le\bar MV^r(x),$$

where we have used Jensen and the uniform drift condition on the set $\{\sup_{k\ge0}S_k^Y(V)<M\}$
proved in Proposition 5.1. $\square$
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Proof of Proposition 6.3. We follow a similar argument to that of [5], Proposition 6. 
Throughout, denote by Bp a generic constant dependent upon p only. Also recall pr <1. 
The proof begins by applying the Biirkholder-Davis inequality (see, e.g., [30], pages 499- 
500), which yields for p > 1 



^x,v)[\Mn\n^snp^^aSl{V)<M}'i 



1/p 



<BpEy 



E 



(2-,y) 



Y, [/S- (^m+l) - Ks.{fsl)iX^)]' 



p/2 



ni/p 



{^^Pk>a Sl{V)<M} 



In the case p > 2, by similar manipulations to those featured in [5] 

E(,,,)[|M„ri{,,p^^^ <,.(^)<A-,j]^/^ < n^/'Bj,MV{xy 
In the case p < 2, one may apply the Cp-inequality to yield 



< 



2_,^(^.y)[l.^s^(^™+i) ~^s^.(/s^)(^™)ri{supj^>(,s^(y)<A?] 



jn—0 



i/p 



Application of Minkowski, conditional Jensen and Proposition 6.2 yields 
from which we can conclude. □



Proof of Proposition 6.4. Our proof is based upon the decomposition of Proposition C.2
(in Appendix C) and then using the Lipschitz continuity properties proved
in Propositions 5.2 and 5.3. Let $M \in (0,\infty)$ be given; suppose that we are on the set
$\{\sup_{k \ge 0} S^Y_k(V) < M\}$. Then

I /s^,+i (Xm+l) - /si; (^rn+l ) | 
n-1 

= EE[^5-^,-^(^™+i)K^%+,-^ej[^5r"'-^(5^)(/)(^™+i)] (B.2) 

^^[c.(C+l)-^^D](^5V-^(0)(/) ■ 
nSN 

Now, consider the first term. Since, for any $m \ge 0$, the kernel $K_{S^Y_m}$ satisfies:

for some finite $\bar M$ and $\rho \in (0, 1)$ independent of $S^Y_m \in \mathscr{P}_M(E)$, it follows that:

Then, adopting the continuity result for $K_\mu$:

\[ |||K_\mu - K_\xi|||_{V^r} \le 2\|\mu - \xi\|_{V^r} \]
for any $\mu, \xi \in \mathscr{P}_\infty(E)$, it follows that:

Since $\|S^Y_{m+1} - S^Y_m\|_{V^r} \le [V(Y_{m+1})^r + S^Y_m(V^r)]/(m+2)$



(1 — p)^ 771 + 2 
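
The bound on the increment of the empirical measure follows from the recursion $S^Y_{m+1} = S^Y_m + (\delta_{Y_{m+1}} - S^Y_m)/(m+2)$. A minimal numerical check is sketched below; the Gaussian draws, the choice $V(y) = 1 + y^2$ and the test function `g` are assumptions made purely for illustration and are not taken from the paper.

```python
import math
import random


def empirical_mean(g, ys):
    """Integrate g against the empirical measure of the points ys."""
    return sum(g(y) for y in ys) / len(ys)


random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(50)]  # stand-in for Y_0, ..., Y_{m+1}
m = len(ys) - 2                                   # so that m + 2 == len(ys)
r = 0.5


def V(y):
    """Hypothetical Lyapunov function, V >= 1."""
    return 1.0 + y * y


def g(y):
    """A test function with |g| <= V^r."""
    return math.sin(y) * V(y) ** r


# Increment of the empirical measure, computed two ways.
lhs = empirical_mean(g, ys) - empirical_mean(g, ys[:-1])
rhs = (g(ys[-1]) - empirical_mean(g, ys[:-1])) / (m + 2)

# Bound [V(Y_{m+1})^r + S_m(V^r)] / (m + 2) from the text.
bound = (V(ys[-1]) ** r + empirical_mean(lambda y: V(y) ** r, ys[:-1])) / (m + 2)
print(lhs, rhs, bound)
```

Since the identity holds for every $g$ with $|g| \le V^r$, taking the supremum over such $g$ gives the stated bound in the $V^r$-norm.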

Turning to the second sum on the right-hand side of (B.2), using the continuity result

\[ \|\omega(\mu) - \omega(\xi)\|_{V^r} \le \bar M\, |||K_\mu - K_\xi|||_{V^r} \]

(for $\bar M < \infty$ not depending on $\mu, \xi \in \mathscr{P}_M(E)$ by Proposition 5.3) and the continuity of
the kernel $K_\mu$ (Lemma C.1) yields,

MSl^,)~u.{Sl)]{K-.~u.{Slm)\ < Mp"E^^-t-?^ia, 

from which we obtain a similar bound for the second sum on the right-hand side of (B.2).
We now establish an $L_p$-bound, for $p > 1$, of this upper bound on $\{\sup_{k \ge 0} S^Y_k(V) < M\}$,
which will allow us to use a Borel-Cantelli argument to complete the proof. Note that it
is naturally sufficient to consider m+V V(Ym+iY on {supj.>o ^'^'(T^) < M}, and we 
focus on 

= E(,,,)[E(,,,)[y(X™+i)'-P|(7^]y(y™+i)'^PI{,,p^^^<,.(v.)<M}]'/^ (B.3) 

where we have used that, conditional upon $\mathcal{G}^Y$ and on the event $\{\sup_{k \ge 0} S^Y_k(V) < M\}$,
the following bound holds: $\mathbb{E}_{(x,y)}[V(X_{m+1})^{rp} \mid \mathcal{G}^Y]^{1/p} \le M_0 V(x)^r$ for some deterministic
constant $M_0$ depending only on $M$ and the parameters of the drift condition in (A1).
Similarly, $\bar M > M_0$ only depends on $M$ and the parameters of the drift conditions in
(A1)-(A2). With $p > 1$ we conclude that

7n—0 ^ ^ — ) / 

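The result then follows by using, for any $\delta > 0$, the bound

\[ \mathbb{P}_{(x,y)}\Big(\sup_{k \ge m} |\hat f_{S^Y_{k+1}}(X_{k+1}) - \hat f_{S^Y_k}(X_{k+1})| > \delta\Big) \le \mathbb{Q}_y\Big(\sup_{k \ge 0} S^Y_k(V) \ge M\Big) + \mathbb{P}_{(x,y)}\Big(\sup_{k \ge m} |\hat f_{S^Y_{k+1}}(X_{k+1}) - \hat f_{S^Y_k}(X_{k+1})| > \delta,\ \sup_{k \ge 0} S^Y_k(V) < M\Big) \]

and using the fact that for any $\epsilon > 0$ one can find an $M$ large enough to ensure that the
first term on the right-hand side is less than $\epsilon/2$ and then $m_0$ such that for any $m \ge m_0$
the second term on the right-hand side is also upper bounded by $\epsilon/2$. □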

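Proof of Proposition 6.5. Note first that for any $i, j \in \mathbb{N}$, $f \in \mathscr{L}_V$ and $x \in E$ such that
$V(x) < \infty$ we have the following bound,

\[ |[\omega(S^Y_i) - \omega(\eta)](f)| \le |\omega(S^Y_i)(f) - K^j_{S^Y_i}(f)(x)| + |K^j_{S^Y_i}(f)(x) - K^j_\eta(f)(x)| + |K^j_\eta(f)(x) - \omega(\eta)(f)|. \]

Let $\epsilon, \delta > 0$ and $M > \eta(V)$ be such that $\mathbb{Q}_y(\sup_{k \ge 0} S^Y_k(V) \ge M) \le \epsilon/4$. On the event
$\{\sup_{k \ge 0} S^Y_k(V) < M\}$ we have by Proposition 5.2 the existence of $\bar M < +\infty$ and $\rho \in [0, 1)$
(independent of $i$) such that the first and last terms on the right-hand side are bounded
by $\bar M \rho^j$. We can therefore fix $j_0$ such that

\[ \mathbb{Q}_y\Big(\sup_{i \ge 0} |\omega(S^Y_i)(f) - K^{j_0}_{S^Y_i}(f)(x)| + |K^{j_0}_\eta(f)(x) - \omega(\eta)(f)| > \delta/2,\ \sup_{k \ge 0} S^Y_k(V) < M\Big) \le \epsilon/2. \]

Now from Lemma D.2 one may conclude that there exists $m_0 \ge 0$ such that for any
$m \ge m_0$

\[ \mathbb{Q}_y\Big(\sup_{i \ge m} |K^{j_0}_{S^Y_i}(f)(x) - K^{j_0}_\eta(f)(x)| > \delta/2,\ \sup_{k \ge 0} S^Y_k(V) < M\Big) \le \epsilon/4. \]

The proof is completed by noting that the results above imply that for $m \ge m_0$,

\[ \mathbb{Q}_y\Big(\sup_{i \ge m} |[\omega(S^Y_i) - \omega(\eta)](f)| > \delta\Big) \le \epsilon. \qquad \text{(B.4)} \]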






Appendix C: Standard technical results on Markov chains

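Lemma C.1. Let $(E, \mathcal{E})$ be a measurable space, $b < \infty$, $\lambda \in (0, 1)$ and $C \in \mathcal{E}$. Then for
any Markov transition probabilities $P_1, P_2 : E \to \mathscr{P}(E)$ satisfying, for $(x, A) \in E \times \mathcal{E}$ and
$i = 1, 2$,

\[ P_i V(x) \le \lambda V(x) + \mathbb{I}_C(x)\, b, \qquad \text{(C.1)} \]
\[ P_i(x, A) \ge \mathbb{I}_C(x)\, \epsilon\, \nu(A), \qquad \text{(C.2)} \]

there exist $M(\cdot) < \infty$, $\rho \in [0,1)$ and invariant probability measures $\pi_1, \pi_2 \in \mathscr{P}(E)$
(corresponding to $P_1$ and $P_2$, respectively), such that for any $n \ge 1$, $r \in [0, 1]$ and any $|f| \le V^r$

\[ |[P_1^n - \pi_1](f)|_{V^r} \vee |[P_2^n - \pi_2](f)|_{V^r} \le M(r)\rho^n, \]

for any $n \ge 1$,

\[ |||P_1^n - P_2^n|||_{V^r} \le M(r)\, |||P_1 - P_2|||_{V^r} \]

and

\[ \|\pi_1 - \pi_2\|_{V^r} \le M(r)\, |||P_1 - P_2|||_{V^r}. \]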
Proof. Let $r \in [0, 1]$ and $f \in \mathscr{L}_{V^r}$. We have the following decomposition:

\[ |[P_1^n - P_2^n](f)| = \Big| \sum_{i=0}^{n-1} P_1^i [P_1 - P_2]\big\{[P_2^{n-i-1} - \pi_2](f)\big\} \Big|. \]

For any $|f| \le V^r$, in a similar manner to Proposition 3 of [5]:

\[ |[P_1^n - P_2^n](f)| \le M(r) \sum_{i=0}^{n-1} \rho^{n-i-1} P_1^i\big(|||P_1 - P_2|||_{V^r} V^r\big) \le M(r)\, |||P_1 - P_2|||_{V^r} \sum_{i=0}^{n-1} \rho^{n-i-1} P_1^i(V^r). \]

From the drift condition (A2) and conditional Jensen one can bound $P_1^i V^r$ by
$[\lambda + b/(1 - \lambda)]^r V(x)^r$ for $r \in [0,1]$ and hence conclude that:

\[ |[P_1^n - P_2^n](f)| \le M(r)\, |||P_1 - P_2|||_{V^r} V^r. \]

Since the right-hand side is independent of $n$, the inequality holds in the limit and hence,
by $V$-uniform ergodicity, the result. □
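
The decomposition at the start of the proof rests on the telescoping identity $P_1^n - P_2^n = \sum_{i=0}^{n-1} P_1^i (P_1 - P_2) P_2^{n-i-1}$; the centring by $\pi_2$ is free because $P_1 - P_2$ annihilates constants. The sketch below checks the identity on two small stochastic matrices, which are hypothetical examples and not objects from the paper.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matpow(a, n):
    """n-th power of a square matrix (n >= 0)."""
    size = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, a)
    return result


def matsub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two hypothetical transition matrices on a three-point state space.
P1 = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
P2 = [[0.8, 0.2, 0.0], [0.25, 0.65, 0.1], [0.1, 0.2, 0.7]]
n = 6

lhs = matsub(matpow(P1, n), matpow(P2, n))
rhs = [[0.0] * 3 for _ in range(3)]
for i in range(n):
    term = matmul(matmul(matpow(P1, i), matsub(P1, P2)), matpow(P2, n - i - 1))
    rhs = matadd(rhs, term)

max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_gap)
```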
Proposition C.2. Assume (A1). Then, for $r \in [0, 1]$, $\xi, \mu \in \mathscr{P}_\infty(E)$, $f \in \mathscr{L}_{V^r}$, we have
the following decomposition for the differences in the solution to the Poisson equation:



neN I 4=0 

-KO-c.(A.)]([i^;:-c.(/.)](/))|. 

Proof. Adopting the resolvent solution to the Poisson equation (which exists under our 
assumptions), we have 

hi^) - U^) = E im-^imfK^)) - i[K;-u;i^^)]if){x))] 

neNo 

"n-l 



J2 E^I([^«-^M]{[^r'"'-^(/^)](/)})w+'^(A')(/)-'^(0(/) 

neNLi=0 

{Tl-1 
J2{[Kl-co{0]{Ke^K,){[K--^''^u;{f,)]{f)}{x)) 
i=0 



since 



n-l 



Y,u;iO[K^-K,]iK;-'^-\f))^~u;iOif-K;if)). 



□



Appendix D: Convergence of the iterates 

The main result of this section is Lemma D.2, where it is established that for any $q \ge 1$,

\[ \lim_{n \to \infty} |[K^q_{S^Y_n} - K^q_\eta](f)(x)| = 0, \quad \mathbb{Q}_y\text{-a.s.}, \qquad \text{(D.1)} \]

with $K_\mu$ as in (2.2). The proof consists of showing that $K^q_\mu(f)(x)$ can be rewritten
as $\mu^{\otimes q}(g)$ for some function $g : E^q \to \mathbb{R}$ to be given below. We will then use results
from Appendix E, associated with V-statistics for an appropriate class of functions, to
complete our argument.

Introduce the following family of Markov transition probabilities on $(E \times E, \mathcal{E} \otimes \mathcal{E})$,
indexed by $z_1 \in E$,

$T_{z_1}((w_0, w_0'); \mathrm{d}(w_1, w_1'))$

:= (1 - e)K{wo,Awi)5w„{dw[) 

For any $w_0, w_0' \in E$ and $z := (z_1, \dots, z_q) \in E^q$, we define the iterates of this family of
kernels as follows: for $k = 2, \dots, q$ and any $f \in \mathscr{L}_V$,

where for any $x, x' \in E$, $(f \otimes 1)(x, x') := f(x)$. Let $z := (z_1, \dots, z_q) \in E^q$. Following an
argument identical to that developed in the proof of Lemma D.2 it is possible to show
that for any $k = 1, \dots, q$, $T^{(k)}_{z_1,\dots,z_k}(f \otimes 1)(w_0, w_0')$ belongs to $\mathscr{L}_{V_{z_1,\dots,z_k}}$, where for $w, w' \in E$,

\[ V_{z_1,\dots,z_k}(w, w') := V(w) + V(w') + \sum_{i=1}^{k} V(z_i). \]

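Proposition D.1. Assume (A1). For any $q \ge 1$, $(z_1, \dots, z_q) \in E^q$, $\mu \in \mathscr{P}_\infty(E)$, $f \in \mathscr{L}_V$,
$x, x' \in E$ we have that

\[ K^q_\mu(f)(x) = \int_{E^q} T^{(q)}_{z_1,\dots,z_q}(f \otimes 1)(x, x')\, \mu^{\otimes q}(\mathrm{d}(z_1, \dots, z_q)). \]

Proof. The result is proved by induction. One immediately checks that for any $z_1 \in E$,
$f \in \mathscr{L}_V$, $w_0, w_0' \in E$,

\[ T_{z_1}(f \otimes 1)(w_0, w_0') = (1 - \epsilon) K(f)(w_0) + \epsilon[\alpha(w_0, z_1) f(z_1) + (1 - \alpha(w_0, z_1)) f(w_0)], \]
and hence

\[ \int_E T_{z_1}(f \otimes 1)(w_0, w_0')\, \mu(\mathrm{d}z_1) = K_\mu(f)(w_0). \]

Now assume the property is true for $k - 1 \ge 1$. Then

as required. □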
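
The base case and the induction step can be made concrete on a finite state space. The sketch below is a simplified illustration under several assumptions: it drops the second coordinate $w_0'$, takes the displayed formula for $T_{z_1}(f \otimes 1)$ as the definition of $T_z$, and uses a hypothetical acceptance probability `alpha`, kernel `K` and measure `mu`. It checks that two steps of the mixture kernel agree with the double average of composed $T_z$ kernels, which is the $q = 2$ instance of the representation.

```python
# Finite-state sketch; every name below is hypothetical, not the paper's notation.
STATES = range(3)
EPS = 0.3
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
WEIGHTS = [1.0, 2.0, 4.0]  # assumed unnormalised weights, used only to build alpha


def alpha(w, z):
    """Hypothetical acceptance probability in [0, 1]."""
    return min(1.0, WEIGHTS[z] / WEIGHTS[w])


def T(z, f):
    """Apply T_z to f: (1 - eps) K f(w) + eps [alpha f(z) + (1 - alpha) f(w)]."""
    return [
        (1 - EPS) * sum(K[w][v] * f[v] for v in STATES)
        + EPS * (alpha(w, z) * f[z] + (1 - alpha(w, z)) * f[w])
        for w in STATES
    ]


def K_mu(mu, f):
    """Mixture kernel: integrate T_z f against mu(dz)."""
    out = [0.0 for _ in STATES]
    for z in STATES:
        tz = T(z, f)
        out = [o + mu[z] * t for o, t in zip(out, tz)]
    return out


mu = [0.2, 0.3, 0.5]
f = [1.0, -2.0, 0.5]

# Two steps of K_mu versus the average of T_{z1} T_{z2} f under mu x mu.
two_step = K_mu(mu, K_mu(mu, f))
double_average = [0.0 for _ in STATES]
for z1 in STATES:
    for z2 in STATES:
        composed = T(z1, T(z2, f))
        double_average = [d + mu[z1] * mu[z2] * c
                          for d, c in zip(double_average, composed)]

gap = max(abs(a - b) for a, b in zip(two_step, double_average))
print(gap)
```

The agreement is a consequence of the linearity of each $T_z$ in $f$, which is the mechanism behind the induction.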

Now, to establish (D.1) we need to show that $T^{(q)}_{z_1,\dots,z_q}(f \otimes 1)(w_0, w_0')$ falls within the class
of functions for which Lemma E.2 applies; this is proved below.

Lemma D.2. Assume (A1)-(A3). Let $q \ge 1$ be fixed and $f \in \mathscr{L}_V$. Then for any $x \in E$

\[ \lim_{n \to \infty} |[K^q_{S^Y_n} - K^q_\eta](f)(x)| = 0 \quad \mathbb{Q}_y\text{-a.s.} \]
Proof. Our objective is to use the representation established in Proposition D.1 along
with the result in Lemma E.2. To that end we show that for any $f \in \mathscr{L}_V$,
$T^{(q)}_{z_1,\dots,z_q}(f \otimes 1)(w_0, w_0') \in \mathscr{L}_{V_{z^{(q)}}}$, $z^{(q)} = (z_1, \dots, z_q)$, where $T^{(q)}_{z_1,\dots,z_q}(f \otimes 1)(w_0, w_0')$ is as in (D.2). The
result can be proved by induction. Now, for any $k = 1, \dots, q$, $w_{k-1}, w'_{k-1} \in E$ and
$z^{(q)} = (z_1, \dots, z_q) \in E^q$



Tzk{^z(t)){wk-i,w'k_i) :=(l-e) 



K{V){wk- 



V{wu- 



-E^(^^) 



+ e< a(wfe_i,Zfe) 



l/(u;fe_i) + l/(zfe)+^V^(z,) 



(1 -a{wo,zi)) 



i/K_i) + yK-i) + E^(^^) 



Since there exists $\bar M < \infty$ such that for any $x \in E$, $K(V)(x) \le \bar M V(x)$, we conclude that
there exists $C_1 > 0$ such that for any $k = 1, \dots, q$, $w_{k-1}, w'_{k-1} \in E$ and $z^{(k)} \in E^k$

\[ T_{z_k}(V_{z^{(k)}})(w_{k-1}, w'_{k-1}) \le C_1 V_{z^{(k)}}(w_{k-1}, w'_{k-1}). \qquad \text{(D.3)} \]



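This implies that for any $g \in \mathscr{L}_{V_{z^{(k)}}}$, $T_{z_k}(g)(w_{k-1}, w'_{k-1}) \in \mathscr{L}_{V_{z^{(k)}}}$. Now we can
proceed with the induction. Assume that for some $k - 1 \ge 1$, if $g \in \mathscr{L}_{V_{z^{(k-1)}}}$ then
$T^{(k-1)}_{z_1,\dots,z_{k-1}}(g)(w, w') \in \mathscr{L}_{V_{z^{(k-1)}}}$. Then by definition

\[ T^{(k)}_{z_1,\dots,z_k}(f \otimes 1)(w_0, w_0') = T^{(k-1)}_{z_1,\dots,z_{k-1}}\{T_{z_k}(f \otimes 1)(\cdot)\}(w_0, w_0') \]

and the induction follows. Now, for any fixed $w_0, w_0'$ one has that $T^{(q)}_{z_1,\dots,z_q}(f \otimes 1)(w_0, w_0') \in \mathscr{L}_{V_{z^{(q)}}}$
and the result follows from Lemma E.2. □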

Appendix E: Results on U and V-statistics for Markov chains



Let $(E, \mathcal{E})$ be a Polish space and $\eta \in \{\mu \in \mathscr{P}(E) : \mu(W) < \infty\}$. Denote $\Omega = E^{\mathbb{N}}$ and
$\mathcal{F} = \mathcal{E}^{\otimes \mathbb{N}}$ and consider a time-homogeneous Markov chain $\{X_n\}_{n \ge 0}$ with transition kernel $P$
such that $\eta P = \eta$ with $X_0 = x$. Denote by $\mathbb{P}_x$ the corresponding probability distribution.
Note that $\{X_n\}$ should not be confused with the process introduced in Section 3.5.

For any sequence $\{Z_n\}$, $Z_n \in E$, any $q \in \mathbb{N}$ and $f : E^q \to \mathbb{R}$, denote for any $n \ge 1$ the
associated V-statistic

\[ S^{\otimes q}_{n,Z}(f) = \frac{1}{(n+1)^q} \sum_{\vartheta \in (q, n+1)} f(Z_{\vartheta(1)}, \dots, Z_{\vartheta(q)}), \qquad \text{(E.1)} \]

where $(q, n+1)$ is the set of all mappings of $\{0, \dots, q-1\}$ into $\{0, \dots, n\}$.
The main result of this section is Lemma E.2, where it is shown, under additional
assumptions on $P$ and $f$, that

\[ \lim_{n \to \infty} S^{\otimes q}_{n,X}(f) = \eta^{\otimes q}(f), \]

$\mathbb{P}_x$-a.s. The proof relies on a coupling argument with another Markov chain $\{Y_n\}_{n \ge 0}$
defined on $(\Omega, \mathcal{F})$ with the same transition $P$, but initialized at stationarity, that is, $Y_0 \sim \eta$.
$\mathbb{P}_\eta$ denotes the corresponding probability distribution.

The conditions on $\{X_n\}_{n \ge 0}$ and $\{Y_n\}_{n \ge 0}$ referred to above are given in (A2), and
will, in particular, imply geometric ergodicity. The class of functions to which our results
apply is defined as follows. Let $(W^r)^{(q)}(x^{(q)}) := \sum_{i=1}^{q} W(x_i)^r$ for any $r \in (0,1)$, $x^{(q)} :=
(x_1, \dots, x_q) \in E^q$; we will consider below the following class of functions

\[ \mathscr{L}_{(W^r)^{(q)}} := \Big\{ f : E^q \to \mathbb{R} : \sup_{x^{(q)} \in E^q} |f(x^{(q)})| / (W^r)^{(q)}(x^{(q)}) < \infty \Big\}. \]

For any sequence $\{Z_n\}$, $Z_n \in E$, any $q \in \mathbb{N}$ and $f : E^q \to \mathbb{R}$ denote for any $n \ge 1$ the
associated U-statistic

\[ S^{\odot q}_{n,Z}(f) = \frac{1}{(n+1)_q} \sum_{\vartheta \in (q, n+1)} f(Z_{\vartheta(1)}, \dots, Z_{\vartheta(q)}), \qquad \text{(E.2)} \]

where $(q, n+1)$ is the set of one-to-one mappings from $\{0, \dots, q-1\}$ into $\{0, \dots, n\}$ and
$n_q := n!/(n-q)!$. A preliminary result on U-statistics is first established, based on the
aforementioned coupling.
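
The two statistics differ only in whether repeated indices are allowed. The brute-force sketch below computes both for a short sequence and checks the elementary relation $(n+1)^q S^{\otimes q} = (n+1)_q S^{\odot q} + \sum_{\vartheta \text{ not one-to-one}} f$. The sequence `zs`, the kernel `f` and the function names are hypothetical, and the enumeration is only practical for small $n$ and $q$.

```python
from itertools import permutations, product
from math import perm


def v_statistic(f, zs, q):
    """Average of f over all index maps {0..q-1} -> {0..n} (repeats allowed)."""
    count = len(zs)
    total = sum(f(*(zs[i] for i in idx)) for idx in product(range(count), repeat=q))
    return total / count ** q


def u_statistic(f, zs, q):
    """Average of f over one-to-one index maps only."""
    count = len(zs)
    total = sum(f(*(zs[i] for i in idx)) for idx in permutations(range(count), q))
    return total / perm(count, q)


def non_injective_sum(f, zs, q):
    """Sum of f over index maps with at least one repeated index."""
    count = len(zs)
    return sum(f(*(zs[i] for i in idx))
               for idx in product(range(count), repeat=q)
               if len(set(idx)) < q)


zs = [0.5, -1.0, 2.0, 0.25, 1.5]  # hypothetical Z_0, ..., Z_n with n = 4


def f(a, b):
    """Hypothetical kernel of order q = 2."""
    return a * b + a


q = 2
count = len(zs)
v_val = v_statistic(f, zs, q)
u_val = u_statistic(f, zs, q)
lhs = count ** q * v_val
rhs = perm(count, q) * u_val + non_injective_sum(f, zs, q)
print(v_val, u_val, abs(lhs - rhs))
```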

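Proposition E.1. Assume (A2) and (A3). Let $\{X_n\}_{n \ge 0}$ and $\{Y_n\}_{n \ge 0}$ be as defined
above. Then for any $q \in \mathbb{N}$, $r \in [0,1)$, $f \in \mathscr{L}_{(W^r)^{(q)}}$ and $x \in E$, there exists a coupling
$\{X_n, Y_n\}_{n \ge 0}$ on some probability space $(\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, \mathbb{P})$, such that

\[ \lim_{n \to \infty} |S^{\odot q}_{n,X}(f) - S^{\odot q}_{n,Y}(f)| = 0 \quad \mathbb{P}\text{-a.s.} \]

Proof. Let $\mathbb{P}^{(n)}_x$ (resp., $\mathbb{P}^{(n)}_\eta$) denote the law of $(X_n, X_{n+1}, \dots)$ (resp., $(Y_n, Y_{n+1}, \dots)$).
Then, convergence in total variation of the processes is sufficient to imply that:

\[ \lim_{n \to \infty} \|\mathbb{P}^{(n)}_x - \mathbb{P}^{(n)}_\eta\|_{\mathrm{tv}} = 0. \]

By Theorem 2.1 of Goldstein [19] the coupling exists; that is, there is a probability space
$(\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, \mathbb{P})$ such that $\mathbb{P}(\Omega \times \cdot) = \mathbb{P}_\eta(\cdot)$ and $\mathbb{P}(\cdot \times \Omega) = \mathbb{P}_x(\cdot)$ (note the dependence
on $x$ of $\mathbb{P}$ is omitted for notational simplicity). The process on this space is written
$\{X_n, Y_n\}_{n \ge 0}$ and $T$ is the associated coupling time. Choose $q \in \mathbb{N}$. For any $\delta > 0$, $M \in \mathbb{N}$,
$n \ge M \vee q$, one has that

\[ \mathbb{P}\Big(\sup_{k \ge n} |S^{\odot q}_{k,X}(f) - S^{\odot q}_{k,Y}(f)| > \delta\Big) \le \mathbb{P}\Big(\sup_{k \ge n} |S^{\odot q}_{k,X}(f) - S^{\odot q}_{k,Y}(f)| > \delta,\ T \le M\Big) + \mathbb{P}(T > M) \]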
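with $S^{\odot q}_{k,X}(f)$ as defined in (E.2). Now let $\epsilon > 0$ be given and choose $M$ such that $\mathbb{P}(T >
M) \le \epsilon/2$. The first term on the right-hand side of the above inequality is now dealt
with:

\[ \mathbb{P}\Big(\sup_{k \ge n} |S^{\odot q}_{k,X}(f) - S^{\odot q}_{k,Y}(f)| > \delta,\ T \le M\Big) = \sum_{l=1}^{M} \mathbb{P}\Big(\sup_{k \ge n} |S^{\odot q}_{k,X}(f) - S^{\odot q}_{k,Y}(f)| > \delta,\ T = l\Big). \qquad \text{(E.3)} \]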



Then, on the event $\{T = l\}$, one has that the terms involved in the definitions of $S^{\odot q}_{k,X}(f)$
and $S^{\odot q}_{k,Y}(f)$ only differ for $\vartheta$'s such that $\vartheta(i) \in \{0, \dots, l-1\}$ for some $i \in \{1, \dots, q\}$. For
any $k \ge m \ge 0$, introduce the subset of $(q, k+1)$

\[ \Xi_{m,k} := \{\vartheta \in (q, k+1) : \exists i \in \{1, \dots, q\} \text{ s.t. } \vartheta(i) < m\}. \]

Then for any le{l,.. ■,M}, with X^j^) = X^(^,-^ and ^^(i) = i^^(i)I{^(i)</} +^tf(i)I{^(.i)>i} 
and the notation 



Aif)x,YiHl),---,^iq))-=f{Mi),---,X^(g))-fi%W,---,%ig)), 



we have 



^(sup|5°|(/)-5®?(/)|>5,T = i) 

Y, A(/k^5.Wl),...,^(g)) 



1 



= F sup -— 

Let us denote for l,n ^N such that n > I 

1 



>S,T = l 



Ai^n--= i sup 



^ A(/);,,^Wl),...,l?(q)) 



?eH,,fc 



>5 



It is now shown that $\mathbb{P}(A_{l,n})$ vanishes as $n \to \infty$, which in turn will prove that the
above vanishes as well for any $l \in \{1, \dots, M\}$. Since $f \in \mathscr{L}_{(W^r)^{(q)}}$, there exists some
(deterministic) constant $\bar M < \infty$ such that



~ ~ M 

P(An)<P sup--— 
\k>n \k + l)q 



J2 {J2i^(-M^)y+w{%ir)y 



i9GE 



>s 



Consequently 



nAn) < p. sup 



M 



k>n (fc + l)q 
M 



Y, <^£[M/(x^(,)r+Ty(x^(,)ri 



i3eE 



, fc> 



n{k+l), 



E {E^iymY^ 



{,?(»)</} 



iJeHi.fe ki=l 



{^(j)>i}J 



>5/2 



>6/2 
The drift condition on $P$ yields the classical result $\sup_{i \ge 0}\{\mathbb{E}_x[W(X_i)] \vee \mathbb{E}_\eta[W(Y_i)]\} < \infty$.
Note in addition that the cardinality of $\Xi_{l,k}$ is

Hence one may use an $L_p$-proof similar to that in Proposition 6.4, with $p \in (1, 1/r)$
along with a Borel-Cantelli argument via Markov's inequality, to conclude that
$\lim_{n \to \infty} \mathbb{P}(A_{l,n}) = 0$. This allows us to complete the proof by choosing $n$ such that each
of the $M$ terms in the summation (E.3) is less than $\epsilon/2M$.

□
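
The size of the index set used in the proof can be checked by direct enumeration. The sketch below counts the one-to-one maps of $\{0,\dots,q-1\}$ into $\{0,\dots,k\}$ that take at least one value below $l$ and compares the count with $(k+1)_q - (k+1-l)_q$. This closed form is our own elementary count (all one-to-one maps minus those avoiding $\{0,\dots,l-1\}$); it is an assumption of this sketch rather than a formula quoted from the text.

```python
from itertools import permutations
from math import perm


def count_hitting(q, k, l):
    """Enumerate one-to-one maps {0..q-1} -> {0..k} with some value below l."""
    return sum(1 for idx in permutations(range(k + 1), q) if min(idx) < l)


def closed_form(q, k, l):
    """All one-to-one maps minus those taking values in {l, ..., k} only."""
    return perm(k + 1, q) - perm(k + 1 - l, q)


for q, k, l in [(2, 5, 1), (3, 6, 2), (3, 7, 4), (4, 6, 3)]:
    print(q, k, l, count_hitting(q, k, l), closed_form(q, k, l))
```

For fixed $q$ and $l$ the count grows like $k^{q-1}$, one order less than the normalisation $(k+1)_q$, which is why the corresponding contribution is negligible.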

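Lemma E.2. Assume (A2) and (A3). Let $q \in \mathbb{N}$, $r \in [0,1)$, $f \in \mathscr{L}_{(W^r)^{(q)}}$, $x \in E$ and
$\{X_i\}$ be as defined earlier. Then,

\[ \lim_{n \to \infty} S^{\otimes q}_{n,X}(f) = \eta^{\otimes q}(f) \quad \mathbb{P}_x\text{-a.s.} \]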

Proof. The idea of the proof is to use the almost sure convergence results for U-statistics
of ergodic stationary processes established in [1]. In order to achieve this, the coupling $\mathbb{P}$
introduced in Proposition E.1 is utilized. In particular, for any $\delta > 0$, consider the
following upper bound

\[ \mathbb{P}\Big(\sup_{k \ge n} |[S^{\otimes q}_{k,X} - \eta^{\otimes q}](f)| > \delta\Big) \]
\[ = \mathbb{P}\Big(\sup_{k \ge n} |[S^{\otimes q}_{k,X} - S^{\odot q}_{k,X}](f) + [S^{\odot q}_{k,X} - S^{\odot q}_{k,Y}](f) + [S^{\odot q}_{k,Y} - \eta^{\otimes q}](f)| > \delta\Big) \]
\[ \le \mathbb{P}_x\Big(\sup_{k \ge n} |[S^{\otimes q}_{k,X} - S^{\odot q}_{k,X}](f)| > \delta/3\Big) + \mathbb{P}\Big(\sup_{k \ge n} |[S^{\odot q}_{k,X} - S^{\odot q}_{k,Y}](f)| > \delta/3\Big) + \mathbb{P}_\eta\Big(\sup_{k \ge n} |[S^{\odot q}_{k,Y} - \eta^{\otimes q}](f)| > \delta/3\Big). \]

The convergence to zero of the terms on the right-hand side of the inequality above is now
considered, from right to left. Since $\{Y_n\}_{n \ge 0}$ is a homogeneous Markov chain, started in
stationarity, it is a stationary ergodic process. In addition, as $f$ is bounded by integrable
products, $(E, \mathcal{E})$ is Polish and $\{Y_n\}_{n \ge 0}$ is absolutely regular (or weakly Bernoulli) [14],
Theorem U of [1] can be invoked; the last term goes to zero (note that the proofs of [1]
extend to Polish spaces). By Proposition E.1, the second term goes to zero.

Let us turn to the first term on the right-hand side of the inequality above. We use an
argument similar to that of Theorem 5.1 of [18]. This uses the following identity

\[ (n+1)^q [S^{\odot q}_{n,X} - S^{\otimes q}_{n,X}](f) = [(n+1)^q - (n+1)_q]\, S^{\odot q}_{n,X}(f) - \sum_{\vartheta \in \langle q, n+1 \rangle} f(X_{\vartheta(1)}, \dots, X_{\vartheta(q)}), \]
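where $\langle q, n+1 \rangle$ denotes the set of all mappings of $\{0, \dots, q-1\}$ into $\{0, \dots, n\}$ with the
one-to-one mappings removed. Let $p \in (1, 1/r)$. Since $f \in \mathscr{L}_{(W^r)^{(q)}}$, for any
$(i_1, \dots, i_q) \in \{0, \dots, n\}^q$, by Minkowski's inequality, followed by Jensen's inequality
and the fact that via the drift condition $\sup_{i \ge 0} \mathbb{E}_x[W^{pr}(X_i)] \le \bar M W^{pr}(x)$ for some
$\bar M < \infty$,

\[ \mathbb{E}_x[|f(X_{i_1}, \dots, X_{i_q})|^p]^{1/p} \le \|f\|_{(W^r)^{(q)}} \sum_{l=1}^{q} \mathbb{E}_x[W^{pr}(X_{i_l})]^{1/p} \le \bar M q\, \|f\|_{(W^r)^{(q)}} W^r(x). \]

As a result

\[ \mathbb{E}_x[|S^{\odot q}_{n,X}(f)|^p]^{1/p} \le \bar M q\, \|f\|_{(W^r)^{(q)}} W^r(x) \]

and

\[ \mathbb{E}_x\Big[\Big|\sum_{\vartheta \in \langle q, n+1 \rangle} f(X_{\vartheta(1)}, \dots, X_{\vartheta(q)})\Big|^p\Big]^{1/p} \le \bar M [(n+1)^q - (n+1)_q]\, q\, \|f\|_{(W^r)^{(q)}} W^r(x), \]

which allows us to conclude that there exists $C_q < \infty$ such that for any $n \ge q$

\[ \mathbb{E}_x\big[|(n+1)^q [S^{\odot q}_{n,X} - S^{\otimes q}_{n,X}](f)|^p\big]^{1/p} \le C_q [(n+1)^q - (n+1)_q]\, W^r(x). \]

Now since $(n+1)^q - (n+1)_q = O(n^{q-1})$ and $p > 1$, a Borel-Cantelli argument can be
used. The proof of the lemma now follows. □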
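
The order of the gap $(n+1)^q - (n+1)_q$ is easy to confirm numerically. The short sketch below uses $q = 3$ as a hypothetical example and prints the gap divided by $n^{q-1}$; expanding the falling factorial suggests the ratio should settle near $q(q-1)/2$.

```python
from math import perm


def gap(n, q):
    """Number of maps {0..q-1} -> {0..n} that are not one-to-one."""
    return (n + 1) ** q - perm(n + 1, q)


q = 3
ratios = [gap(n, q) / n ** (q - 1) for n in (10, 100, 1000, 10000)]
print(ratios)  # approaches q * (q - 1) / 2 = 3 as n grows
```

After dividing by $(n+1)^q$, the difference between the two statistics is therefore of order $1/n$ in $L_p$, which with $p > 1$ is what makes the Borel-Cantelli argument available.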

Appendix F: Verifying the assumptions 



Proof of Proposition 7.1. Verifying many of the assumptions (Al) and (A2) is fairly 
simple and can be found in, for example, [21] (i.e., (Al)(i)(iii) and (A2)). The small-set 
condition (Al)(ii) can easily be proved in a similar way to the proof of Theorem 2.2 in [29] 
and is thus omitted. This leaves us with the latter part of (A2) ((A3) is clearly true here). 
In our case, 



V{x) 



tt{x) 



for any s^ G (0, 1) (see [21], Theorems 4.1 and 4.3). The expression for W{x) 



s^. G(0,1), 



7r(x) 



follows similarly. For the last part of (A2), fix r* ,3^ G (0, 1); then 

V{x) 



\s^—r as^ 



wixy I loo 

which is upper bounded if $s_v \in (0, r^* s_w)$.



T:{xy "" 



□